On overfitting, generalization, and randomly expanded training sets

Authors

  • George N. Karystinos
  • Dimitris A. Pados
Abstract

An algorithmic procedure is developed for the random expansion of a given training set to combat overfitting and improve the generalization ability of backpropagation trained multilayer perceptrons (MLPs). The training set is K-means clustered and locally most entropic colored Gaussian joint input-output probability density function (pdf) estimates are formed per cluster. The number of clusters is chosen such that the resulting overall colored Gaussian mixture exhibits minimum differential entropy upon global cross-validated shaping. Numerical studies on real data and synthetic data examples drawn from the literature illustrate and support these theoretical developments.
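The core idea of the procedure can be sketched in a few lines: K-means cluster the joint input-output vectors, fit a Gaussian density per cluster, and draw synthetic training pairs from the resulting mixture. The sketch below is a minimal illustration of that pipeline in NumPy; the entropy-minimizing choice of the number of clusters and the cross-validated shaping of the mixture described in the abstract are omitted, and the function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def expand_training_set(X, Y, n_clusters=3, n_new=100, seed=0):
    """Randomly expand a training set by sampling from per-cluster
    Gaussian estimates of the joint input-output density.

    Simplified sketch: K-means on the joint (input, output) vectors,
    one Gaussian per cluster, synthetic samples drawn from the mixture
    in proportion to cluster size.
    """
    rng = np.random.default_rng(seed)
    Z = np.hstack([X, Y])                      # joint input-output vectors
    d = Z.shape[1]

    # --- plain K-means (Lloyd's algorithm) on the joint vectors ---
    centers = Z[rng.choice(len(Z), n_clusters, replace=False)].astype(float)
    for _ in range(50):
        labels = np.argmin(((Z[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = Z[labels == k].mean(axis=0)

    # --- per-cluster Gaussian estimates, sampled proportionally ---
    counts = np.bincount(labels, minlength=n_clusters)
    new = []
    for k in range(n_clusters):
        Zk = Z[labels == k]
        if len(Zk) < 2:
            continue                            # too few points to estimate a covariance
        mu = Zk.mean(axis=0)
        cov = np.cov(Zk, rowvar=False) + 1e-6 * np.eye(d)   # jitter for stability
        nk = int(round(n_new * counts[k] / len(Z)))
        new.append(rng.multivariate_normal(mu, cov, size=nk))

    Znew = np.vstack(new)
    # split the synthetic joint vectors back into inputs and outputs
    return Znew[:, :X.shape[1]], Znew[:, X.shape[1]:]
```

The expanded pairs would then be appended to the original set before backpropagation training, giving the MLP a smoothed version of the empirical joint density to learn from.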


Similar resources

Avoiding Boosting Overfitting by Removing Confusing Samples

Boosting methods are known to exhibit noticeable overfitting on some datasets, while being immune to overfitting on others. In this paper we show that standard boosting algorithms are not appropriate in the case of overlapping classes. This inadequacy is likely to be the major source of boosting overfitting when working with real-world data. To verify our conclusion we use the fact that an...


Deriving the Kernel from Training Data

In this paper we propose a strategy for constructing data-driven kernels, automatically determined by the training examples. Basically, their associated Reproducing Kernel Hilbert Spaces arise from finite sets of linearly independent functions, that can be interpreted as weak classifiers or regressors, learned from training material. When working in the Tikhonov regularization framework, the u...


A Fast Scheme for Feature Subset Selection to Avoid Overfitting in AdaBoost

AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We show that with the introduction of a scoring function and the random selection of training data it is possible to create a smaller set of feature vectors. The selection of th...


GraphConnect: A Regularization Framework for Neural Networks

Deep neural networks have proved very successful in domains where large training sets are available, but when the number of training samples is small, their performance suffers from overfitting. Prior methods of reducing overfitting such as weight decay, Dropout and DropConnect are data-independent. This paper proposes a new method, GraphConnect, that is data-dependent, and is motivated by the ...


Dynamics of Supervised Learning with Restricted Training Sets and Noisy Teachers

We generalize a recent formalism to describe the dynamics of supervised learning in layered neural networks, in the regime where data recycling is inevitable, to the case of noisy teachers. Our theory generates predictions for the evolution in time of training and generalization errors, and extends the class of mathematically solvable learning processes in large neural networks to those complica...



Journal:
  • IEEE transactions on neural networks

Volume 11, Issue 5

Pages: -

Publication date: 2000